Abstract

In this work we investigate the behavior of the minimal rate needed in order to guarantee a given probability that the distortion exceeds a prescribed threshold, at some fixed finite quantization block length. We show that the excess coding rate above the rate-distortion function is inversely proportional (to the first order) to the square root of the block length. We give an explicit expression for the proportion constant, which is given by the inverse Q-function of the allowed excess distortion probability, times the square root of a constant, termed the excess distortion dispersion. This result is the dual of a corresponding channel coding result, where the dispersion above is the dual of the channel dispersion. The work treats discrete memoryless sources, as well as the quadratic-Gaussian case.

I. Introduction

Rate-distortion theory [1] tells us that in the limit of large block length n, a discrete memoryless source (DMS) with distribution p can be represented with some average distortion D by a code of any rate greater than the rate-distortion function (RDF)

$$R(p, D) = \min_{W :\, E_{p,W}[d(X,\hat{X})] \le D} I(p, W), \tag{1}$$

where $d(x, \hat{x})$ is the distortion measure, $W(\hat{x}|x)$ is any channel from the source to the reproduction alphabet and I(·, ·) denotes the mutual information. However, beyond the expected distortion, one may be interested in ensuring that the distortion for one source block is below some threshold. To that end, we define an excess distortion event $\mathcal{E}(D)$ as

$$\mathcal{E}(D) \triangleq \{ d(\mathbf{x}, \hat{\mathbf{x}}) > D \}, \tag{2}$$

where $d(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{n}\sum_{i=1}^{n} d(x_i, \hat{x}_i)$ is the distortion between the source and reproduction words $\mathbf{x}$ and $\hat{\mathbf{x}}$.

A natural question to ask is how fast the probability of such an event can be made to decay as a function of the block length. An asymptotic answer is given by Marton's excess distortion exponent for the best code of rate R,

$$\lim_{n\to\infty} -\frac{1}{n}\log \Pr\{\mathcal{E}(D)\} = \min_{q :\, R(q,D) > R} D(q\|p) = F(R, p, D), \tag{3}$$

assuming the limit exists. D(·||·) is the divergence between the two distributions.¹ Intuitively speaking, this result means that, asymptotically, the error probability is governed by the first-order empirical statistics of the source sequence; if the sequence happens to be "too rich" to be quantized with rate R, an error (excess distortion event) will occur.



1 Throughout the paper, logarithms are taken to the natural base e and rates are given in nats.



We are interested in the following related question: for a given excess distortion probability ε, what is the optimal (minimal) rate required to achieve it? This question is not answered by Marton's exponent, and even the asymptotic behavior of the optimal rate is unknown.

A similar question can be asked in the context of channel coding: for a given error probability ε, what is the maximal communication rate that can be achieved? Again, this question is not answered by the channel error exponent [3]. The asymptotics of the rate behavior were first studied in the 1960s [4] using the normal approximation. This result was recently tightened and extended to the Gaussian channel, along with nonasymptotic results, in a comprehensive work by Polyanskiy et al. [5]. In channel coding, the maximal rate that can be achieved over a channel W is approximately given by



$$R \approx C(W) - \sqrt{\frac{V(W)}{n}}\, Q^{-1}(\varepsilon), \tag{4}$$

where C(W) is the channel capacity, Q is the complementary Gaussian cumulative distribution function, and the quantity V(W) is a constant that depends on the channel only, termed the channel dispersion. See [5] for details and more refinements of (4).

Our main result is the following. Suppose the source p is to be quantized with distortion threshold D and a fixed excess distortion probability ε > 0. Then the minimal rate R needed for quantization in blocks of length n is given by

$$R \approx R(p, D) + \sqrt{\frac{V(p, D)}{n}}\, Q^{-1}(\varepsilon), \tag{5}$$

where V(p, D) is a constant which we call the excess distortion dispersion, given in detail later on. We show that (5) holds for any DMS under some smoothness conditions on R(p, D), and for a Gaussian source with quadratic distortion measure; see Theorems 1 and 2, respectively.
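As a rough numerical illustration of (5), the sketch below evaluates the approximation for a few block lengths. The function names and the values R(p, D) = 0.5 nats and V(p, D) = 0.25 are hypothetical, chosen only for illustration; Q^{-1} is obtained from the standard normal quantile function in Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def q_inverse(eps: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1.0 - eps)


def approx_min_rate(rdf_nats: float, dispersion: float, n: int, eps: float) -> float:
    """First-order approximation (5): R(p,D) + sqrt(V/n) * Q^{-1}(eps), in nats."""
    return rdf_nats + sqrt(dispersion / n) * q_inverse(eps)


if __name__ == "__main__":
    # Hypothetical source parameters, not taken from the paper.
    for n in (100, 1000, 10000):
        print(n, approx_min_rate(rdf_nats=0.5, dispersion=0.25, n=n, eps=0.01))
```

Under this approximation the gap to the RDF shrinks as 1/√n, so a hundredfold increase in n reduces the excess rate tenfold.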

It is worth noting that there is a large body of previous work regarding the redundancy of lossy source coding in related settings. However, these works are mostly concerned with two questions: the behavior of the word length of variable-rate codes where the distortion should always be below some threshold (a.k.a. D-semifaithful codes) [6], or the average excess distortion of fixed-rate codes; see e.g. [7], [8] and the references therein. We consider the excess-distortion probability, thus bridging between these works and the concepts of excess-distortion exponent and dispersion discussed above. In this context, the work by Kontoyiannis [8] is of special interest, since it introduces a constant which equals V(p, D), as shown in the sequel.

II. Main Result for Discrete Memoryless Sources 

Let the source X be drawn from an i.i.d. distribution p over the alphabet $\mathcal{X} = \{1, \dots, L\}$, and let the reproduction alphabet be $\hat{\mathcal{X}} = \{1, \dots, K\}$. The distribution p can be seen as a vector $p = [p_1, \dots, p_L]^T \in \mathcal{P}_L$, where $p_i = \Pr(X = i)$ and $\mathcal{P}_L$ is the probability simplex:

$$\mathcal{P}_L \triangleq \Big\{ q \in \mathbb{R}^L :\ q_i \ge 0\ \ \forall i \in \{1, \dots, L\};\ \ \sum_{i=1}^{L} q_i = 1 \Big\}. \tag{6}$$

Let $d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$ denote a general nonnegative single-letter distortion measure, bounded by some finite $D_{\max}$. Denote the rate-distortion function for the source p and the distortion measure d(·, ·) at some level D by R(p, D). Whenever this function is differentiable w.r.t. its coordinates $p_i$, define the partial derivatives by

$$R'(i) \triangleq \frac{\partial}{\partial q_i} R(q, D) \Big|_{q=p}. \tag{7}$$

Note that R'(i) implicitly depends on p and D as well. For a random source symbol X, we may look at R'(i) as the values that a random variable R'(X) takes. Also note that in order to define the derivative, we extend the definition of the RDF R(p, D) to general vectors in $(0, 1)^L$ (cf. [6, Theorem 2]). In any case, we will only be interested in the value of this derivative for values of p within the simplex, i.e., those that represent probability distributions.

Let $\mathbf{x} \in \mathcal{X}^n$ and $\hat{\mathbf{x}} \in \hat{\mathcal{X}}^n$ denote the source and reproduction words, respectively. Recalling (2), let $R_{p,D,\varepsilon}(n)$ be the optimal (minimal) code rate at length n s.t. the probability of an excess distortion event $\mathcal{E}(D)$ is at most ε.

It is known that $R_{p,D,\varepsilon}(n) \to R(p, D)$ as n → ∞. This can be deduced, e.g., from Marton's excess distortion exponent [2]. Our main result quantifies the rate of this convergence.

Theorem 1: A DMS with distribution p is to be quantized with distortion threshold D, block length n and excess distortion probability ε. Assume that R(q, D) is differentiable w.r.t. D and twice differentiable w.r.t. q in some neighborhood of (p, D). Then

$$R_{p,D,\varepsilon}(n) = R(p, D) + \sqrt{\frac{V(p, D)}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{8}$$

where V(p, D) is the excess distortion dispersion, given by

$$V(p, D) \triangleq \mathrm{Var}[R'(X)] = \sum_{i=1}^{L} p_i \big(R'(i)\big)^2 - \Big[\sum_{i=1}^{L} p_i R'(i)\Big]^2. \tag{9}$$
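To make (7) and (9) concrete, the following minimal sketch estimates the partial derivatives by central differences and then evaluates the variance. As a stand-in for a general RDF it uses the lossless case R(q, 0) = H(q), extended to positive vectors outside the simplex; this choice, the step size h and all function names are assumptions made for illustration only.

```python
from math import log


def entropy_extended(q):
    """-sum_i q_i log q_i, evaluated on any positive vector (not only the simplex)."""
    return -sum(x * log(x) for x in q)


def partial_derivatives(func, p, h=1e-6):
    """Central-difference estimate of d func(q) / d q_i at q = p, as in (7)."""
    grads = []
    for i in range(len(p)):
        up = list(p)
        dn = list(p)
        up[i] += h
        dn[i] -= h
        grads.append((func(up) - func(dn)) / (2.0 * h))
    return grads


def dispersion(p, derivs):
    """Var[R'(X)] as in (9): second moment minus squared mean under p."""
    mean = sum(pi * r for pi, r in zip(p, derivs))
    return sum(pi * r * r for pi, r in zip(p, derivs)) - mean ** 2


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    r_prime = partial_derivatives(entropy_extended, p)
    print(r_prime, dispersion(p, r_prime))
```

For this stand-in function the numerical derivatives should agree with the closed form -1 - log p_i, and the dispersion vanishes for a uniform source.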



This result is closely related to the following central-limit theorem (CLT) result of [8]. If we allow a code with variable rate $r(\mathbf{x}) = l(\mathbf{x})/n$, where $l(\mathbf{x})$ is the length of the codeword needed to describe the source word $\mathbf{x}$, then for the best code:

$$r(\mathbf{x}) = R(p, D) + \frac{G_n}{\sqrt{n}} + O\!\left(\frac{\log n}{n}\right),$$

where $\{G_n\}$ converge in distribution to a Gaussian random variable of variance V(p, D).² If the $G_n$ were exactly Gaussian, and we truncated this variable-length code by declaring an excess-distortion event whenever the length exceeds nR, then the excess distortion probability would exactly satisfy the achievability bound of Theorem 1. However, this is not immediate, as one needs to take into account the rate of convergence of the sequence $\{G_n\}$.

We follow a different direction, which is closer in spirit to the derivation of the excess distortion exponent in [2]. Specifically, we show that the O(1/√n) redundancy term comes only from the probability that the source will produce a sequence whose type is too complex to be covered with rate R.

The proof is based on the method of types. We adopt the notation of Csiszár and Körner [9]: the type of a sequence $\mathbf{x} \in \mathcal{X}^n$ is the vector $P_{\mathbf{x}} \in \mathcal{P}_L$ whose elements are the relative frequencies of the alphabet letters in $\mathbf{x}$. $\mathcal{T}_n$ denotes the set of all types of sequences of length n.



2 The variance has a different expression in [8]; we show in Section III-B that the two forms are equivalent.



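We say that a sequence $\mathbf{x}$ has type $q \in \mathcal{T}_n$ if $P_{\mathbf{x}} = q$. The type class of the type $q \in \mathcal{T}_n$, denoted $T_q$, is the set of all sequences $\mathbf{x} \in \mathcal{X}^n$ with type q.

For a reconstruction word $\hat{\mathbf{x}} \in \hat{\mathcal{X}}^n$, we say that $\mathbf{x}$ is D-covered by $\hat{\mathbf{x}}$ if $d(\mathbf{x}, \hat{\mathbf{x}}) \le D$.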
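The two definitions above, the type of a sequence and D-covering, can be sketched in a few lines. The helper names, the toy codebook and the Hamming distortion used in the example are hypothetical and serve only to illustrate the definitions.

```python
from collections import Counter


def type_of(x, alphabet_size):
    """Relative frequencies of the letters 1..L in the sequence x."""
    counts = Counter(x)
    n = len(x)
    return [counts.get(a, 0) / n for a in range(1, alphabet_size + 1)]


def avg_distortion(x, x_hat, d):
    """Per-letter average distortion between a source and a reproduction word."""
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)


def is_d_covered(x, codebook, d, D):
    """True if at least one codeword reproduces x with distortion at most D."""
    return any(avg_distortion(x, c, d) <= D for c in codebook)


def hamming(a, b):
    """Hamming distortion: 1 if the letters differ, 0 otherwise."""
    return 0.0 if a == b else 1.0


if __name__ == "__main__":
    x = [1, 2, 2, 3]
    print(type_of(x, 3))
    print(is_d_covered(x, [[1, 2, 2, 1]], hamming, 0.25))
```

All sequences in a type class share the same type, so a codebook that D-covers a type class handles every permutation of a given sequence.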

Proposition 1 (Type covering): Let $q \in \mathcal{T}_n$ with a corresponding type class $T_q$. Let A(q, C, D) be the intersection of $T_q$ with the set of source sequences $\mathbf{x} \in \mathcal{X}^n$ which are D-covered by at least one of the words in a codebook C with rate R (i.e., $|C| = e^{nR}$). Then:

1) If |∂R(q, D)/∂D| is bounded in some neighborhood of q, then there exists a codebook $C_q$ that completely D-covers $T_q$ (i.e., $A(q, C_q, D) = T_q$), where for large enough n,

$$\frac{1}{n}\log|C_q| = R \le R(q, D) + J_1 \frac{\log n}{n}, \tag{10}$$

where $J_1 = J_1(L, K)$ is a constant.

2) For any type $q \in \mathcal{T}_n$ s.t. R(q, D) > R, the fraction of the type class that is D-covered by any code with rate R is bounded by

$$\frac{|A(q, C, D)|}{|T_q|} \le \exp\left\{ -n \left[ R(q, D) - R + J_2 \frac{\log n}{n} \right] \right\}, \tag{11}$$

where $J_2 = J_2(L, K)$ is a constant.

The first part of this proposition is a refinement of Berger's type-covering lemma [1], found in [6]. The second part is a corollary of [7, Lemma 3]. Both parts of the proposition are stronger versions than needed in [2], due to the non-exponential treatment of the excess distortion probability.³ Equipped with this, the missing ingredient is an analysis of the relation between the rate R and the probability that the source produces a type which requires a description rate higher than R. It is given in the following lemma, which is proved in Section V.

Lemma 1 (Rate Redundancy): Consider a DMS p and a distortion threshold D. Assume that R(p, D) is differentiable w.r.t. D and twice differentiable w.r.t. p in some neighborhood of (p, D). A random source word is denoted by $\mathbf{x}$ and its type by $P_{\mathbf{x}}$. Let ε be a given probability and let ΔR be chosen s.t.

$$\Pr\{R(P_{\mathbf{x}}, D) - R(p, D) > \Delta R\} = \varepsilon.$$

Then, as n grows,

$$\Delta R = \sqrt{\frac{V(p, D)}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{12}$$

where V(p, D) is given by (9). The same holds even if we replace ε with $\varepsilon + g_n$, as long as $g_n = O(\log n / \sqrt{n})$.

Proof of Theorem 1: Achievability part.
Let ΔR > 0. We construct a code C as follows. The code shall consist of the union of the codes that cover all the types q ∈ Φ(n, D, ΔR), where

$$\Phi(n, D, \Delta R) = \{ q : R(q, D) \le R(p, D) + \Delta R \} \cap \Omega_n, \tag{13}$$

where $\Omega_n = \{ q : \|p - q\|^2 \le L \frac{\log n}{n} \}$.



3 For the first part, Marton uses Berger's original lemma, while for the second part it is proved that the ratio between $|T_q|$ and $|A(q, C, D)|$ is upper-bounded by a constant.



Lemma 2: For a source word $\mathbf{x}$ drawn from the source p, we have $\Pr\{P_{\mathbf{x}} \notin \Omega_n\} \le 2L/n^2$.
The proof of this technical lemma is omitted. It can be proved using techniques similar to those in [6, Theorem 2].

The size of the code is bounded by

$$|C| \le \sum_{q \in \Phi(n, D, \Delta R)} |C_q| \le |\mathcal{T}_n|\, |C_{q^*}| \le (n+1)^L\, |C_{q^*}|, \tag{14}$$

where q* is the covered type whose codebook $C_{q^*}$ is the largest.
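The polynomial factor in (14) comes from the standard bound on the number of types. The sketch below, with hypothetical function names, counts the types exactly by a stars-and-bars argument and reports the resulting rate penalty of (L/n) log(n+1) nats, which is of order log n / n.

```python
from math import comb, log


def num_types(n, L):
    """Exact number of types of length-n sequences over an L-letter alphabet."""
    return comb(n + L - 1, L - 1)


def type_count_rate_penalty(n, L):
    """Rate cost (nats per symbol) of the factor (n+1)^L in the codebook size."""
    return L * log(n + 1) / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, num_types(n, 3), (n + 1) ** 3, type_count_rate_penalty(n, 3))
```

Because the number of types grows only polynomially in n, taking a union of per-type codebooks costs a vanishing amount of rate.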

Since we assumed that R(p, D) is differentiable w.r.t. D at p, the derivative is bounded over any small enough neighborhood of p. In particular, it is bounded over $\Omega_n$ for large enough n, and thus for all types covered by the codebook. We can therefore apply part 1 of Proposition 1 and get a bound on the rate:

$$R = \frac{1}{n}\log|C| \le \frac{L}{n}\log(n+1) + \frac{1}{n}\log|C_{q^*}| \tag{15}$$

$$\le R(p, D) + \Delta R + O\!\left(\frac{\log n}{n}\right). \tag{16}$$

Since we completely cover all the types in Φ(n, D, ΔR), the probability of the excess distortion event (2) satisfies

$$\Pr\{\mathcal{E}(D)\} = \Pr\{P_{\mathbf{x}} \notin \Phi(n, D, \Delta R)\}$$

$$\le \Pr\{R(P_{\mathbf{x}}, D) > R(p, D) + \Delta R\} + \Pr\{P_{\mathbf{x}} \notin \Omega_n\} \tag{17}$$

$$\le \Pr\{R(P_{\mathbf{x}}, D) > R(p, D) + \Delta R\} + \frac{2L}{n^2}, \tag{18}$$

where (17) follows from the union bound, and (18) is justified by Lemma 2.

We select ΔR s.t. the probability of $\{R(P_{\mathbf{x}}, D) > R(p, D) + \Delta R\}$ is exactly $\varepsilon - 2L/n^2$, and get a code with excess distortion probability at most ε. By Lemma 1 we have

$$\Delta R = \sqrt{\frac{V(p, D)}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right),$$

and by plugging into (16), the rate R is bounded by the RHS of (8), as required.
Converse part. 

Let C be a code with rate R, and suppose that its excess distortion probability is ε. Our goal is to lower bound ΔR = R − R(p, D).

Again, the source word is $\mathbf{x}$ and its type is $P_{\mathbf{x}}$. The following holds for any ψ:

$$\varepsilon = \Pr\{\mathcal{E}(D)\} = \Pr\{\mathcal{E}(D) \mid R(P_{\mathbf{x}}, D) \le R + \psi\} \Pr\{R(P_{\mathbf{x}}, D) \le R + \psi\}$$

$$\quad + \Pr\{\mathcal{E}(D) \mid R(P_{\mathbf{x}}, D) > R + \psi\} \Pr\{R(P_{\mathbf{x}}, D) > R + \psi\}$$

$$\ge \Pr\{\mathcal{E}(D) \mid R(P_{\mathbf{x}}, D) > R + \psi\} \Pr\{R(P_{\mathbf{x}}, D) > R + \psi\}. \tag{19}$$

Take a type $q \in \mathcal{T}_n$, and assume that R(q, D) > R + ψ. By the second part of Proposition 1, the fraction of the type class $T_q$ that is covered by the code C is at most

$$\exp\left\{ -n \left[ R(q, D) - R + J_2 \frac{\log n}{n} \right] \right\} \le \exp\{ -n\psi + J_2 \log n \}. \tag{20}$$

By setting $\psi = (J_2 + 1) \frac{\log n}{n}$ we get that the fraction is bounded by 1/n. Since the source sequences within a given type are uniformly distributed, the probability of covering a sequence from a type whose $R(P_{\mathbf{x}}, D)$ is too high is at most 1/n. We therefore have

$$\varepsilon \ge \left(1 - \frac{1}{n}\right) \Pr\{R(P_{\mathbf{x}}, D) > R + \psi\} \ge \frac{1}{1 + \frac{2}{n}} \Pr\{R(P_{\mathbf{x}}, D) > R + \psi\}, \tag{21}$$

where the last inequality follows since $1 - x \ge \frac{1}{1 + 2x}$ for all x ∈ [0, 1/2].
We rewrite (21) and get that ΔR must satisfy

$$\varepsilon + \frac{2\varepsilon}{n} \ge \Pr\{R(P_{\mathbf{x}}, D) - R(p, D) > \Delta R + \psi\}. \tag{22}$$

By Lemma 1 and the fact that $\psi = O\!\left(\frac{\log n}{n}\right)$, we get

$$\Delta R \ge \sqrt{\frac{V(p, D)}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{23}$$

as required. ■ 

III. Excess Distortion Dispersion: Properties and Evaluation 

A. Differentiability of the RDF 

In the results above, we assumed differentiability of the RDF R(p, D) with respect to D (once) and p (twice). In general, the RDF is not differentiable w.r.t. either. However, it is differentiable "almost always" in the following sense. Let K'(p, D) be the "effective reproduction alphabet size", i.e., the number of reproduction letters of positive probability for the channel minimizing (1). If K'(p, D) is constant in a neighborhood of D, then R(p, D) is differentiable w.r.t. D and twice differentiable w.r.t. p at that point.

When keeping p fixed and changing D, the points where K'(p, D) changes may represent "jumps" in the excess distortion dispersion V(p, D). At these points, we cannot specify the exact behavior of the excess rate, but a careful derivation should verify that it is between $V(p, D^-)$ and $V(p, D^+)$. However, in the process we will encounter at most L − 2 such points.

B. Alternative Representations 

The evaluation of the excess distortion dispersion seems to be a difficult task, as it involves derivatives of the RDF w.r.t. the source distribution. However, we have the following alternative representations.

First we connect the dispersion to the excess-distortion exponent (3), much in the same way that the channel dispersion constant is related to the channel error exponent; see [2] for details on the early origins of this approximation by Shannon.

Proposition 2: If R(p, D) is differentiable at distortion level D, then

$$V(p, D) = \left[ \frac{\partial^2}{\partial R^2} F(R, p, D) \Big|_{R = R(p, D)} \right]^{-1}.$$

The proof, not included in this version, follows by directly considering the exponent definition (3) in the limit of small excess rate.



We further show equivalence to the variance of the excess rate in [8], which is close in spirit to the dispersion as discussed in Section II.

Proposition 3: If R(p, D) is differentiable at distortion level D, then V(p, D) = Var[f(X)], where

$$f(i) = -\log E_{\hat{X}} \exp\{ -\lambda [ d(i, \hat{X}) - D ] \},$$

where the expectation is taken according to the reproduction distribution induced by the channel minimizing (1) for p and D, and λ = −∂R(p, D)/∂D at that point.

This form is especially appealing, since it can also be shown that R(p, D) = E{f(X)}, thus presenting the dispersion as a "second-order RDF". The equivalence can be proven by starting from this representation of the RDF. Applying (7),

$$V(p, D) = \mathrm{Var}\left[ \frac{\partial}{\partial q_i} \sum_{j=1}^{L} q_j f(j) \Big|_{q=p} \right] = \mathrm{Var}\left[ f(i) + \sum_{j=1}^{L} q_j \frac{\partial f(j)}{\partial q_i} \Big|_{q=p} \right].$$

A straightforward derivation shows that the term to the right of the addition in the last form is constant in i; thus it does not affect the variance, as required.
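Proposition 3 can be checked numerically on a simple example. The sketch below assumes a binary source with Hamming distortion and D below min(p, 1 − p), for which the RDF is h(p) − h(D) and the optimal reproduction distribution has the well-known closed form used in the code; neither this example nor the function names come from the text. The slope parameter is taken as λ = log((1 − D)/D), which equals −∂R/∂D for this RDF.

```python
from math import exp, log


def binary_entropy(t):
    """Binary entropy in nats for 0 < t < 1."""
    return -t * log(t) - (1.0 - t) * log(1.0 - t)


def tilted_information(p1, D):
    """f(0), f(1) for Pr(X=1)=p1 under Hamming distortion, with D < min(p1, 1-p1)."""
    lam = log((1.0 - D) / D)             # equals -dR/dD for R = h(p1) - h(D)
    q1 = (p1 - D) / (1.0 - 2.0 * D)      # optimal reproduction probability of letter 1
    q = (1.0 - q1, q1)
    f = []
    for i in (0, 1):
        # Expectation over the reproduction letter j of exp{-lam [d(i,j) - D]}.
        inner = sum(q[j] * exp(-lam * ((i != j) - D)) for j in (0, 1))
        f.append(-log(inner))
    return f


def mean_and_variance(p1, f):
    """Mean and variance of f(X) under the source distribution (1-p1, p1)."""
    probs = (1.0 - p1, p1)
    m = sum(w * v for w, v in zip(probs, f))
    var = sum(w * v * v for w, v in zip(probs, f)) - m * m
    return m, var


if __name__ == "__main__":
    f = tilted_information(0.3, 0.1)
    print(mean_and_variance(0.3, f))
```

In this example the mean of f(X) reproduces h(p) − h(D), and the variance coincides with Var[log p_X], consistent with the low-distortion special case discussed in Section III-C.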

C. Some Special Cases 

In some cases the evaluation may be simplified, as follows.
1) Zero distortion. Whenever R(p, 0) = H(p), we have

$$\frac{\partial}{\partial q_i} R(q, 0) \Big|_{q=p} = -1 - \log p_i.$$

Thus,

$$V(p, 0) = \mathrm{Var}[\log p_X]. \tag{24}$$

This is in agreement with the long-known lossless dispersion result [4].

2) Difference distortion measure with low distortion. Assume that

$$d(x, \hat{x}) = d([x - \hat{x}] \bmod L) = d(z).$$

Since we assumed that each source letter has positive probability, there exists some $D_0(p) > 0$ s.t. for all $D < D_0$ the optimum backward channel is $x = \hat{x} + z$. The RDF is then given by

$$R(p, D) = H(p) - H(w_z), \tag{25}$$

where $w_z$ is the maximum-entropy distribution such that E{d(z)} ≤ D [1, Sec. 4.3.1]. Since this distribution does not depend on p as long as $D < D_0(p)$, the second term in (25) is fixed in p in a neighborhood of the source distribution. Consequently the derivatives only come from the first term, and (24) holds for all $0 < D < D_0$.

3) Hamming distortion measure. In this special case of a difference distortion measure, the optimum backward channel is modulo-additive also above $D_0$, where the modulo is taken over a reduced alphabet. Consequently, the dispersion is the variance of the logarithm of a normalized smaller-alphabet distribution.

4) Zero dispersion. The dispersion becomes zero when the source distribution maximizes the RDF over all possible source distributions on the input alphabet (thus the rate redundancy in Lemma 1 is zero). Note that this is in agreement with the fact that in this case the excess-distortion exponent "jumps" from zero to infinity at zero excess rate. For difference measures, this happens if and only if the source is uniform, in agreement with the observation in [8]. However, in general p need not be uniform.

IV. Gaussian Source with Quadratic Distortion Measure 

In this section we part with the assumption that the source is discrete. While the derivation of the excess distortion dispersion for general continuous-amplitude sources is left for future work, we solve the important special case of a Gaussian source with MSE (quadratic) distortion measure.

Let the source X be i.i.d. zero-mean Gaussian with variance σ². The distortion measure is given by d(x, y) = (x − y)². For D < σ², the quadratic-Gaussian RDF is given by

$$R(\sigma^2, D) = \frac{1}{2} \log \frac{\sigma^2}{D}.$$

In this case, the excess distortion exponent (3) is given by [10]:

$$F(R, \sigma^2, D) = \frac{1}{2} \left[ \frac{D}{\sigma^2} e^{2R} - 1 - \log\left( \frac{D}{\sigma^2} e^{2R} \right) \right] \tag{26}$$

$$= \frac{1}{2} \left[ e^{2\Delta R} - 1 - 2\Delta R \right], \tag{27}$$

where ΔR = R − R(σ², D).

As in the finite-alphabet case, we define $R_{\sigma^2,D,\varepsilon}(n)$ to be the minimal code rate at length n s.t. the excess distortion probability is at most ε. From the excess distortion exponent (27) it follows that $R_{\sigma^2,D,\varepsilon}(n) \to R(\sigma^2, D)$ as n → ∞.

We are interested in the behavior of $R_{\sigma^2,D,\varepsilon}(n)$ as n grows. We show that the quadratic-Gaussian case behaves according to (5), just like the finite-alphabet one. Recalling Proposition 2, one expects the dispersion constant to be

$$V(\sigma^2, D) = \left[ \frac{\partial^2}{\partial R^2} F(R, \sigma^2, D) \Big|_{R = R(\sigma^2, D)} \right]^{-1} = \frac{1}{2}.$$



It can also be shown that the value of 1/2 can be obtained by a continuous version of (9).
We now show that this is indeed the case.
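As a numerical sanity check on the value 1/2, the sketch below differentiates the quadratic-Gaussian exponent (26) twice at R = R(σ², D) using a central finite difference and inverts the result, in the manner of Proposition 2. The function names, the step size h and the example values σ² = 1, D = 0.25 are assumptions made for illustration.

```python
from math import exp, log


def rdf_gaussian(sigma2, D):
    """Quadratic-Gaussian RDF in nats, valid for D < sigma2."""
    return 0.5 * log(sigma2 / D)


def exponent_gaussian(R, sigma2, D):
    """Excess distortion exponent of the Gaussian source, as in (26)."""
    t = (D / sigma2) * exp(2.0 * R)
    return 0.5 * (t - 1.0 - log(t))


def dispersion_from_exponent(sigma2, D, h=1e-4):
    """Inverse of the second derivative of F w.r.t. R at R = R(sigma2, D)."""
    r0 = rdf_gaussian(sigma2, D)
    second = (exponent_gaussian(r0 + h, sigma2, D)
              - 2.0 * exponent_gaussian(r0, sigma2, D)
              + exponent_gaussian(r0 - h, sigma2, D)) / (h * h)
    return 1.0 / second


if __name__ == "__main__":
    print(dispersion_from_exponent(1.0, 0.25))
```

The result does not depend on σ² or D, since by (27) the exponent depends on them only through ΔR.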

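Theorem 2: Let ε > 0 be a given excess distortion probability. Then the rate $R_{\sigma^2,D,\varepsilon}(n)$ satisfies

$$O\!\left(\frac{1}{n}\right) \le R_{\sigma^2,D,\varepsilon}(n) - R(\sigma^2, D) - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) \le \frac{5}{2n}\log n + O\!\left(\frac{1}{n}\right). \tag{28}$$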

Proof outline: The proof is similar in spirit to the proof of Theorem 1, where spheres take the part of types. The type class of types near the source distribution is analogous to a sphere with radius r, where r² is close to nσ².

For the achievability part, we define a "typical" sphere with radius $\sqrt{n\sigma^2(1 + \alpha_n)}$, with $\alpha_n \to 0$ as n → ∞. $\alpha_n$ is chosen s.t. the probability that the source falls outside the sphere is exactly ε, so our code needs to D-cover the entire sphere. Note that the radius is just over the typical radius of the source. We use a sphere covering result by Rogers [11, Theorem 3], and find a code that can D-cover the entire typical sphere with no more than $c\, n^{5/2} (\sigma^2(1 + \alpha_n)/D)^{n/2}$ reconstruction words for some constant c. By arguments similar to those used in the proof of Lemma 1 we get $\alpha_n = \sqrt{2/n}\, Q^{-1}(\varepsilon) + O(1/n)$, so the rate R is bounded according to (28).



For the converse part, we follow the proof of the converse to the excess distortion exponent in [10]. We get that the excess distortion probability is lower bounded by the probability of leaving a sphere that has a volume of $e^{nR}$ times the volume of a single D-ball around a reconstruction point. Again, using the Berry-Esseen theorem we connect the excess distortion probability and the ratio of the radii, and get that the rate R is lower bounded according to (28).
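The achievability argument can be illustrated numerically by converting the covering codebook size into a rate and comparing it with the leading terms of (28). In the sketch below the covering constant c = 1, the omission of the O(1/n) correction in α_n, and the function names are all assumptions made for illustration only.

```python
from math import log, sqrt
from statistics import NormalDist


def gaussian_rate_approx(sigma2, D, n, eps):
    """Leading terms of (28): R(sigma2, D) + sqrt(1/(2n)) * Q^{-1}(eps)."""
    q_inv = NormalDist().inv_cdf(1.0 - eps)
    return 0.5 * log(sigma2 / D) + sqrt(1.0 / (2.0 * n)) * q_inv


def covering_rate(sigma2, D, n, alpha_n, c=1.0):
    """(1/n) log of the codebook size c * n^{5/2} * (sigma2 (1 + alpha_n) / D)^{n/2}."""
    return (log(c) + 2.5 * log(n)) / n + 0.5 * log(sigma2 * (1.0 + alpha_n) / D)


if __name__ == "__main__":
    n, eps = 1000, 0.01
    alpha_n = sqrt(2.0 / n) * NormalDist().inv_cdf(1.0 - eps)  # O(1/n) term dropped
    gap = covering_rate(1.0, 0.25, n, alpha_n) - gaussian_rate_approx(1.0, 0.25, n, eps)
    print(gap, 2.5 * log(n) / n)
```

With these choices the gap between the covering rate and the leading terms stays below (5/(2n)) log n, in line with the upper bound in (28).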



V. Proof of the Rate Redundancy Lemma 

Proof of Lemma 1: Let $\mathbf{x}$ be a source word with type $P_{\mathbf{x}}$, drawn from the source p. We prove the more general version of the lemma, with $\varepsilon + g_n$ being the given probability. The relation between ε and ΔR is given by

$$\varepsilon + g_n = \Pr\{ R(P_{\mathbf{x}}, D) > R(p, D) + \Delta R \}. \tag{29}$$

By the regularity assumptions on R(p, D), we use the Taylor approximation and write

$$R(P_{\mathbf{x}}, D) = R(p, D) + \sum_{i=1}^{L} (P_{\mathbf{x}}(i) - p_i) R'(i) + \gamma(P_{\mathbf{x}}, p), \tag{30}$$

where R'(·) was defined in (7), and $\gamma(P_{\mathbf{x}}, p)$ is the correction term for the approximation. Equation (29) now becomes

$$\varepsilon + g_n = \Pr\Big\{ \sum_{i=1}^{L} (P_{\mathbf{x}}(i) - p_i) R'(i) + \gamma(P_{\mathbf{x}}, p) > \Delta R \Big\}. \tag{31}$$

By the Taylor approximation theorem, and by the assumption of finite second derivatives of R(p, D), the correction term satisfies $\gamma(P_{\mathbf{x}}, p) = O(\|P_{\mathbf{x}} - p\|^2)$. This means that there exists a constant η s.t. for large enough n, $\gamma(P_{\mathbf{x}}, p) \le \eta \|P_{\mathbf{x}} - p\|^2$. By Lemma 2, there exists Γ = O(log n / n) s.t. $\Pr\{\gamma(P_{\mathbf{x}}, p) > \Gamma\} \le 2L/n^2$.

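Using simple probability rules, for any random variables A and B and a constant c, we have that for any $\Gamma_1, \Gamma_2$ the following holds:

$$\Pr\{A + B > c\} \le \Pr\{A > c - \Gamma_1\} + \Pr\{B > \Gamma_1\}, \tag{32}$$

$$\Pr\{A + B > c\} \ge \Pr\{A > c + \Gamma_2\} - \Pr\{B < -\Gamma_2\}. \tag{33}$$

In our case, we use (32) (resp. (33)) to show the upper (resp. lower) bound on ΔR. By selecting $\Gamma_1 = \Gamma_2 = \Gamma$, we get

$$\varepsilon + g_n \le \Pr\Big\{ \sum_{i=1}^{L} (P_{\mathbf{x}}(i) - p_i) R'(i) > \Delta R - \Gamma \Big\} + O\!\left(\frac{1}{n^2}\right), \tag{34}$$

$$\varepsilon + g_n \ge \Pr\Big\{ \sum_{i=1}^{L} (P_{\mathbf{x}}(i) - p_i) R'(i) > \Delta R + \Gamma \Big\} - O\!\left(\frac{1}{n^2}\right). \tag{35}$$

Now consider the probability expression in (34):

$$\Pr\Big\{ \sum_{i=1}^{L} (P_{\mathbf{x}}(i) - p_i) R'(i) > \Delta R - \Gamma \Big\} = \Pr\Big\{ \frac{1}{n}\sum_{k=1}^{n} R'(x_k) - \sum_{i=1}^{L} p_i R'(i) > \Delta R - \Gamma \Big\}.$$
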


The quantity $\frac{1}{n}\sum_{k=1}^{n} R'(x_k)$ can be interpreted as an average of n i.i.d. random variables R'(X), whose expectation is given by $E[R'(X)] = \sum_{i=1}^{L} p_i R'(i)$. Their variance is given by V(p, D), defined in (9). By the central limit theorem, the sum of i.i.d. random variables normalized by √n converges to a Gaussian random variable as n grows. Specifically, by the Berry-Esseen theorem (see, e.g., [12, Ch. XVI.5]), we get

$$\Pr\Big\{ \frac{1}{n}\sum_{k=1}^{n} \big( R'(x_k) - E[R'(X)] \big) > \Delta R - \Gamma \Big\} = Q\Big( (\Delta R - \Gamma) \sqrt{\frac{n}{V(p, D)}} \Big) \pm O\Big( \frac{\xi}{V(p, D)^{3/2} \sqrt{n}} \Big), \tag{36}$$

where $\xi = E\big[ |R'(X) - E[R'(X)]|^3 \big]$. By applying the same derivation to ΔR + Γ, (34) and (35) can be written together as

$$\varepsilon + O\Big( \frac{\log n}{\sqrt{n}} \Big) = Q\Big( (\Delta R \pm \Gamma) \sqrt{\frac{n}{V(p, D)}} \Big). \tag{37}$$

By the smoothness of $Q^{-1}(\cdot)$ around ε and the Taylor approximation we have

$$\Delta R = \sqrt{\frac{V(p, D)}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{38}$$

as required. ■
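The statement of Lemma 1 can be probed exactly in a small example. The sketch below assumes a binary source at zero distortion, so that R(P_x, D) is the empirical entropy h(k/n) with k binomially distributed; it computes the smallest threshold whose exceedance probability is at most ε and compares it with the normal approximation. The example, the function names and the handling of ties (treated as distinct values) are assumptions for illustration, not part of the proof.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def h(t):
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * log(t) - (1.0 - t) * log(1.0 - t)


def binom_pmf(n, k, p):
    """Binomial probability computed in the log domain to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def exact_delta_r(n, p, eps):
    """Smallest value t = h(k/n) - h(p) with Pr{h(P_x) - h(p) > t} <= eps."""
    pairs = sorted(((h(k / n) - h(p), binom_pmf(n, k, p)) for k in range(n + 1)),
                   reverse=True)
    tail = 0.0            # probability mass of the values already passed
    best = pairs[0][0]
    for value, prob in pairs:
        if tail > eps:
            break
        best = value
        tail += prob
    return best


def normal_approx_delta_r(n, p, eps):
    """sqrt(V/n) * Q^{-1}(eps) with V = Var[log p_X] for a binary source."""
    v = p * (1.0 - p) * log((1.0 - p) / p) ** 2
    return sqrt(v / n) * NormalDist().inv_cdf(1.0 - eps)


if __name__ == "__main__":
    n, p, eps = 2000, 0.2, 0.1
    print(exact_delta_r(n, p, eps), normal_approx_delta_r(n, p, eps), log(n) / n)
```

In this example the difference between the exact threshold and the normal approximation is small compared with log n / n, which is the order of the correction term in (38).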